persist and recover shard states. #41
Conversation
Avoid using the BytesMount in tests, as it encodes the bytes as base64 in the URL.
Great stuff! Just some nits.
- Needs some solid tests.
- I'm worried about FSM idempotency after resetting the state to ShardStateNew: we need to ensure we don't end up with more than one transient copy if we'd already fetched one before going down and restarting. I'll write a test for that.
return fmt.Errorf("failed to apply mount upgrader: %w", err)
}

s.indexed = ps.Indexed
I think rather than keeping track of this indexed field across restarts (or even at all), we can always ask the IndexRepo if it has the index we need. That'll make it one less piece of mutable state to update/track/keep in sync with the IndexRepo.
Yeah, that simplification is worth it. We can use StatFullIndex to check if the index exists.
@@ -40,5 +41,5 @@ type Shard struct {
	wAcquire []*waiter
	wDestroy *waiter

-	refs uint32 // count of DAG accessors currently open
+	refs uint32 // number of DAG accessors currently open
nit: would be nice to have a comment here about which of these fields are persisted and which aren't.
Will add.
Closes #32.
Closes #24.
Closes #13.
This PR fleshes out another important part of the system: shard state persistence and resumption upon restart, and it also rewrites the mount registry/factory abstraction.
Persisting shard state
On every iteration of the event loop, we persist the current shard state in a datastore.Datastore provided by the user, which we assume to be namespaced. Shards are persisted under their shard key, and the state is serialised as JSON. There is unused code for switching to CBOR in the future, if we find that to be more performant.
We can add extra smarts to avoid persisting when unnecessary (e.g. when the persisted entity would be identical to the last one written). We can easily do this because PersistentShard is a comparable struct, so we could retain the last written PersistentShard in the shard state, calculate the new one, and skip the datastore put if they're identical. TODO in #44.

We also need to clean up the serialization logic. I don't like MarshalJSON and UnmarshalJSON
doing this much.

Resuming the DAG store
On start, we restore the state from the Datastore. We iterate over and recover all shards, overwriting some interim states.
Mount registry and factory
I revisited the mount.Registry and mount.MountFactory design, simplifying it by removing the latter. Instead, we use a Mount instance as a template, and we clone it every time we instantiate a new mount of that type. This allows us to set "environmental" properties that get applied automatically to all mounts of that type, such as a JSON-RPC endpoint in the Lotus mount.
An example of this is used in the tests with the new mount.FSMount, which I replaced mount.BytesMount with for testing, since we can't afford to serialize full CAR files into base64-encoded URLs every time we persist shards.

Finally, the new registry design allows us to register the same mount type under many different schemes, with different templates. This is important if, for example, you have various fs.FS instances you want to serve from (for testing), or you want to cater to more than one Lotus deployment. Since we do type matching, you do need to create new types when doing that.